Call .wait_tensor() in compiled region for dist.Work created in eager region #2485

Open, wants to merge 1 commit into main

@Microve Microve commented Oct 13, 2024

Summary:
In a compiled region, instead of calling `dist.Work.wait()`, we call `torch.ops._c10d_functional.wait_tensor()` on the `dist.Work`'s output tensor. This lets the `torch.compile` graph capture the `wait_tensor()` op (instead of graph-breaking on `dist.Work.wait()`), so the tensor is waited on properly within the graph.

This diff also depends on pytorch/pytorch#137763 to function properly.

Differential Revision: D64275115
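
A minimal sketch of the dispatch described in the summary, not the code in this diff. `PendingOutput` and `wait_on` are hypothetical names, and the `compiling` flag and `wait_tensor` callable are injected so the sketch runs without a process group. In real code they would presumably be `torch.compiler.is_compiling()` and `torch.ops._c10d_functional.wait_tensor`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class PendingOutput:
    """Hypothetical holder pairing a collective's output with its Work handle."""
    tensor: Any
    work: Optional[Any] = None  # anything exposing .wait(), e.g. a dist.Work


def wait_on(pending: PendingOutput, *, compiling: bool,
            wait_tensor: Callable[[Any], Any]) -> Any:
    """Return the output tensor once it is safe to read.

    Assumption: callers pass torch.compiler.is_compiling() as `compiling`
    and torch.ops._c10d_functional.wait_tensor as `wait_tensor`.
    """
    if compiling:
        # Compiled region: wait on the tensor itself, so the wait is an op
        # the graph can capture rather than a call on the Work object.
        return wait_tensor(pending.tensor)
    # Eager region: the usual Work.wait() path.
    if pending.work is not None:
        pending.work.wait()
    return pending.tensor


# Stand-ins used only to exercise both branches without torch installed.
class FakeWork:
    def __init__(self) -> None:
        self.waited = False

    def wait(self) -> bool:
        self.waited = True
        return True


calls = []


def fake_wait_tensor(t: Any) -> Any:
    calls.append(t)
    return t


work = FakeWork()
pending = PendingOutput(tensor=[1.0, 2.0], work=work)
eager_result = wait_on(pending, compiling=False, wait_tensor=fake_wait_tensor)
compiled_result = wait_on(pending, compiling=True, wait_tensor=fake_wait_tensor)
```

The eager call goes through `FakeWork.wait()`, and the compiled call goes only through the injected `wait_tensor` stand-in. Whether the real op behaves correctly for a `dist.Work` created in eager mode is what the summary says depends on pytorch/pytorch#137763.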

@facebook-github-bot facebook-github-bot added the CLA Signed label Oct 13, 2024
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D64275115

Labels
CLA Signed, fb-exported
3 participants